    Finger Search in Grammar-Compressed Strings

    Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string, report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index $f$, called the \emph{finger}, and the query index $i$. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant, where moving the finger in time depending on the distance moved is also supported. Let $n$ be the size of the grammar, and let $N$ be the size of the string. For the static variant we give a linear space representation that supports placing the finger in $O(\log N)$ time and subsequently accessing in $O(\log D)$ time, where $D$ is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in $O(\log N)$ time and accessing and moving the finger in $O(\log D + \log \log N)$ time. Compared to the best linear space solution to random access, we improve an $O(\log N)$ query bound to $O(\log D)$ for the static variant and to $O(\log D + \log \log N)$ for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar-compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.
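
    As background for the random access problem above (a minimal sketch, not the paper's data structure): in a straight-line program every rule expands either to a single character or to the concatenation of two earlier rules, and storing the expansion length of each rule lets a query descend from the start symbol to any position. The sketch below takes time proportional to the grammar's height, which is the baseline that the paper's $O(\log N)$ access and $O(\log D)$ finger search bounds improve on; all type and function names are illustrative.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Minimal random access on a straight-line program (SLP): every rule expands
// either to a single character or to the concatenation of two earlier rules.
// The query walks one node per grammar level using precomputed expansion lengths.
struct Rule {
    bool terminal;      // true if the rule expands to a single character
    char ch;            // that character (only used when terminal)
    int left, right;    // indices of the two sub-rules (only used when not terminal)
    std::uint64_t len;  // length of the rule's expansion
};

struct SLP {
    std::vector<Rule> rules;  // the last rule added is treated as the start symbol

    int add_terminal(char c) {
        rules.push_back({true, c, -1, -1, 1});
        return static_cast<int>(rules.size()) - 1;
    }
    int add_pair(int l, int r) {
        rules.push_back({false, '\0', l, r, rules[l].len + rules[r].len});
        return static_cast<int>(rules.size()) - 1;
    }
    // Report the character at position i (0-based) of the expansion of the start symbol.
    char access(std::uint64_t i) const {
        int v = static_cast<int>(rules.size()) - 1;
        while (!rules[v].terminal) {
            std::uint64_t left_len = rules[rules[v].left].len;
            if (i < left_len) {
                v = rules[v].left;   // the position lies in the left half
            } else {
                i -= left_len;       // skip the left half, descend into the right
                v = rules[v].right;
            }
        }
        return rules[v].ch;
    }
};

int main() {
    // Grammar for "abab": A -> a, B -> b, C -> AB, S -> CC.
    SLP g;
    int A = g.add_terminal('a');
    int B = g.add_terminal('b');
    int C = g.add_pair(A, B);
    g.add_pair(C, C);                                                 // start symbol, expands to "abab"
    for (std::uint64_t i = 0; i < 4; ++i) std::putchar(g.access(i));  // prints "abab"
    std::putchar('\n');
    return 0;
}
```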

    Fast Dynamic Arrays

    We present a highly optimized implementation of tiered vectors, a data structure for maintaining a sequence of $n$ elements supporting access in time $O(1)$ and insertion and deletion in time $O(n^\epsilon)$ for $\epsilon > 0$ while using $o(n)$ extra space. We consider several different implementation optimizations in C++ and compare their performance to that of vector and multiset from the standard library on sequences with up to $10^8$ elements. Our fastest implementation uses much less space than multiset while providing speedups of $40\times$ for access operations compared to multiset and speedups of $10{,}000\times$ compared to vector for insertion and deletion operations, while being competitive with both data structures for all other operations.
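
    To illustrate the idea (a simplified two-level sketch, not the paper's optimized k-level, circular-buffer implementation): keep the elements in blocks of capacity $B$, with every block except the last kept full, so access is plain index arithmetic, while an insertion edits one block and then cascades one overflowing element into each later block, costing $O(B + n/B) = O(\sqrt{n})$ for $B \approx \sqrt{n}$. The class name and the use of std::deque per block are illustrative choices made here, not the paper's.

```cpp
#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

// Simplified two-level tiered vector: all blocks except the last are kept full,
// so access is pure index arithmetic; insertion cascades a single overflowing
// element into each later block. The paper's k-level circular-buffer layout
// reduces the O(sqrt(n)) insertion cost sketched here to O(n^epsilon).
class TieredVector {
    std::size_t B;                        // block capacity
    std::vector<std::deque<int>> blocks;  // every block except the last is full

public:
    explicit TieredVector(std::size_t block_capacity) : B(block_capacity) {}

    std::size_t size() const {
        std::size_t s = 0;
        for (const auto& b : blocks) s += b.size();
        return s;
    }

    // O(1) access thanks to the "all blocks but the last are full" invariant.
    int operator[](std::size_t i) const { return blocks[i / B][i % B]; }

    void insert(std::size_t i, int value) {
        if (blocks.empty()) blocks.emplace_back();
        std::size_t bi = (i / B < blocks.size()) ? i / B : blocks.size() - 1;
        blocks[bi].insert(blocks[bi].begin() + static_cast<std::ptrdiff_t>(i - bi * B), value);
        // Cascade the overflow: move the last element of each over-full block
        // to the front of the next block, creating a new block if needed.
        for (std::size_t j = bi; j < blocks.size() && blocks[j].size() > B; ++j) {
            if (j + 1 == blocks.size()) blocks.emplace_back();
            blocks[j + 1].push_front(blocks[j].back());
            blocks[j].pop_back();
        }
    }
};

int main() {
    TieredVector tv(4);                            // tiny block capacity for illustration
    for (int i = 0; i < 10; ++i) tv.insert(i, i);  // builds 0 1 2 3 4 5 6 7 8 9
    tv.insert(3, 42);                              // shifts part of one block, then cascades
    for (std::size_t i = 0; i < tv.size(); ++i) std::cout << tv[i] << ' ';
    std::cout << '\n';                             // 0 1 2 42 3 4 5 6 7 8 9
    return 0;
}
```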

    Compressed Indexing with Signature Grammars

    The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$, report all occurrences of $P$ in $S$. We present a data structure that supports pattern matching queries in $O(m + occ(\lg\lg n + \lg^\epsilon z))$ time using $O(z \lg(n/z))$ space, where $z$ is the size of the LZ77 parse of $S$ and $\epsilon > 0$ is an arbitrarily small constant, when the alphabet is small or $z = O(n^{1-\delta})$ for any constant $\delta > 0$. We also present two data structures for the general case; one where the space is increased by $O(z \lg\lg z)$, and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides if $P$ occurs in $S$ in $O(m)$ time using $O(z \lg(n/z))$ space. Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well-known techniques relating pattern matching to 2D range reporting.
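
    The measure $z$ in the bounds above is the number of phrases in the LZ77 parse of $S$: each phrase is either a character with no earlier occurrence or the longest prefix of the remaining suffix that also occurs starting at an earlier position. The naive factorization below only illustrates what $z$ counts; it is not part of the paper's index construction, and all names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Naive (cubic-time) computation of the self-referential LZ77 parse. Each phrase
// is either a fresh single character or the longest prefix of the remaining
// suffix that also occurs starting at an earlier position (the source occurrence
// may overlap the phrase). Purely illustrative.
struct Phrase {
    std::size_t start;  // where the phrase begins in S
    std::size_t len;    // copy length, or 0 for a literal phrase
    std::size_t src;    // start of an earlier occurrence (when len > 0)
    char literal;       // the new character (when len == 0)
};

std::vector<Phrase> lz77(const std::string& S) {
    std::vector<Phrase> parse;
    for (std::size_t i = 0; i < S.size();) {
        std::size_t best_len = 0, best_src = 0;
        for (std::size_t j = 0; j < i; ++j) {  // candidate earlier starting position
            std::size_t l = 0;
            while (i + l < S.size() && S[j + l] == S[i + l]) ++l;
            if (l > best_len) { best_len = l; best_src = j; }
        }
        if (best_len == 0) parse.push_back({i, 0, 0, S[i]});  // fresh character
        else parse.push_back({i, best_len, best_src, '\0'});  // copy phrase
        i += std::max<std::size_t>(1, best_len);
    }
    return parse;
}

int main() {
    std::string S = "abababbbbb";
    std::vector<Phrase> parse = lz77(S);
    std::cout << "z = " << parse.size() << '\n';  // z = 4 for this string
    for (const Phrase& p : parse) {
        if (p.len == 0) std::cout << "literal '" << p.literal << "'\n";
        else std::cout << "copy " << p.len << " chars from position " << p.src << '\n';
    }
    return 0;
}
```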

    Optimal-Time Dictionary-Compressed Indexes

    We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings, including \emph{string attractors} (new combinatorial objects encompassing most known compressibility measures for highly repetitive texts) and grammars based on \emph{locally-consistent parsing}. More in detail, let $\gamma$ be the size of the smallest attractor for a text $T$ of length $n$. The measure $\gamma$ is an (asymptotic) lower bound on the size of dictionary compressors based on Lempel--Ziv, context-free grammars, and many others. The smallest known text representations in terms of attractors use space $O(\gamma\log(n/\gamma))$, and our lightest indexes work within the same asymptotic space. Let $\epsilon > 0$ be a suitably small constant fixed at construction time, $m$ be the pattern length, and $occ$ be the number of its text occurrences. Our index counts pattern occurrences in $O(m + \log^{2+\epsilon} n)$ time, and locates them in $O(m + (occ+1)\log^\epsilon n)$ time. These times already outperform those of most dictionary-compressed indexes, while obtaining the least asymptotic space for any index searching within $O((m + occ)\,\textrm{polylog}\,n)$ time. Further, by increasing the space to $O(\gamma\log(n/\gamma)\log^\epsilon n)$, we reduce the locating time to the optimal $O(m + occ)$, and within $O(\gamma\log(n/\gamma)\log n)$ space we can also count in optimal $O(m)$ time. No dictionary-compressed index had obtained these times before. All our indexes can be constructed in $O(n)$ space and $O(n\log n)$ expected time. As a byproduct of independent interest...
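
    To make the attractor notion concrete (an illustrative brute-force checker, not part of the paper): a set $\Gamma$ of positions is a string attractor of $T$ if every substring of $T$ has at least one occurrence that spans a position of $\Gamma$, and $\gamma$ is the size of the smallest such set. The verifier below is only meant for tiny examples.

```cpp
#include <cstddef>
#include <iostream>
#include <set>
#include <string>

// Brute-force check of the string attractor definition: gamma is an attractor of
// T if every substring of T has at least one occurrence containing a position of
// gamma. Polynomial but far too slow for real texts; illustration only.
bool is_attractor(const std::string& T, const std::set<std::size_t>& gamma) {
    std::size_t n = T.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {       // substring T[i..j]
            std::string sub = T.substr(i, j - i + 1);
            bool covered = false;
            // Search every occurrence of sub for one containing an attractor position.
            for (std::size_t k = 0; k + sub.size() <= n && !covered; ++k) {
                if (T.compare(k, sub.size(), sub) != 0) continue;
                for (std::size_t p : gamma) {
                    if (k <= p && p <= k + sub.size() - 1) { covered = true; break; }
                }
            }
            if (!covered) return false;             // this substring is uncovered
        }
    }
    return true;
}

int main() {
    std::string T = "abab";
    std::cout << is_attractor(T, {1, 2}) << '\n';  // 1: positions {1,2} cover every substring
    std::cout << is_attractor(T, {1}) << '\n';     // 0: substring "a" never spans position 1
    return 0;
}
```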

    Active megadetachment beneath the western United States

    Geodetic data, interpreted in light of seismic imaging, seismicity, xenolith studies, and the late Quaternary geologic history of the northern Great Basin, suggest that a subcontinental-scale extensional detachment is localized near the Moho. To first order, seismic yielding in the upper crust at any given latitude in this region occurs via an M7 earthquake every 100 years. Here we develop the hypothesis that since 1996, the region has undergone a cycle of strain accumulation and release similar to “slow slip events” observed on subduction megathrusts, but yielding occurred on a subhorizontal surface 5–10 times larger in the slip direction, and at temperatures >800°C. Net slip was variable, ranging from 5 to 10 mm over most of the region. Strain energy with moment magnitude equivalent to an M7 earthquake was released along this “megadetachment,” primarily between 2000.0 and 2005.5. Slip initiated in late 1998 to mid-1999 in northeastern Nevada and is best expressed in late 2003 during a magma injection event at Moho depth beneath the Sierra Nevada, accompanied by more rapid eastward relative displacement across the entire region. The event ended in the east at 2004.0 and in the remainder of the network at about 2005.5. Strain energy thus appears to have been transmitted from the Cordilleran interior toward the plate boundary, from high gravitational potential to low, via yielding on the megadetachment. The size and kinematic function of the proposed structure, in light of various proxies for lithospheric thickness, imply that the subcrustal lithosphere beneath Nevada is a strong, thin plate, even though it resides in a high heat flow tectonic regime. A strong lowermost crust and upper mantle is consistent with patterns of postseismic relaxation in the southern Great Basin, deformation microstructures and low water content in dunite xenoliths in young lavas in central Nevada, and high-temperature microstructures in analog surface exposures of deformed lower crust. Large-scale decoupling between crust and upper mantle is consistent with the broad distribution of strain in the upper crust versus the more localized distribution in the subcrustal lithosphere, as inferred from such proxies as low P wave velocity and mafic magmatism.